81 research outputs found

    Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

    The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are several ways to assess the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
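
The "seven times better" figure can be checked from the numbers quoted in the abstract. The sketch below assumes that "sequence missed" means 100% minus the coverage of finished sequence, which the abstract does not state explicitly; the function name `assembly_quality` is illustrative and not from the paper.

```python
def assembly_quality(coverage_pct, errors_per_10kb):
    """Quality score: inverse of (base error rate x fraction of sequence missed).

    Assumption: "sequence missed" = 100% - coverage of finished sequence.
    """
    missed = (100.0 - coverage_pct) / 100.0
    error_rate = errors_per_10kb / 10_000.0
    return 1.0 / (error_rate * missed)


# Figures quoted in the abstract for the 21 finished BACs.
atlas = assembly_quality(coverage_pct=93.4, errors_per_10kb=4.5)
umd = assembly_quality(coverage_pct=96.3, errors_per_10kb=1.1)

ratio = umd / atlas
print(f"Relative quality (PhrapUMD-based vs. Atlas): {ratio:.1f}x")
```

Under that assumption the ratio comes out a little above 7, consistent with the abstract's claim.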

    Phospholipase A2-activating protein is associated with a novel form of leukoencephalopathy

    Leukoencephalopathies are a group of white matter disorders related to abnormal formation, maintenance, and turnover of myelin in the central nervous system. These disorders of the brain are categorized according to neuroradiological and pathophysiological criteria. Herein, we have identified a unique form of leukoencephalopathy in seven patients presenting at ages 2 to 4 months with progressive microcephaly, spastic quadriparesis, and global developmental delay. Clinical, metabolic, and imaging characterization of the seven patients was performed, followed by homozygosity mapping and linkage analysis. Next-generation sequencing, bioinformatics, and segregation analyses were then used to identify a loss-of-function sequence variation in the phospholipase A2-activating protein encoding gene (PLAA). Expression and functional studies of the encoded protein were performed and included measurement of prostaglandin E2 and cytosolic phospholipase A2 activity in membrane fractions of fibroblasts derived from patients and healthy controls. Plaa-null mice were generated and prostaglandin E2 levels were measured in different tissues. The novel phenotype of our patients segregated with a homozygous loss-of-function sequence variant in PLAA, causing the substitution of leucine at position 752 with phenylalanine, which disrupts the protein's ability to induce prostaglandin E2 and cytosolic phospholipase A2 synthesis in patients' fibroblasts. Plaa-null mice were perinatal lethal with reduced brain levels of prostaglandin E2. The non-functional phospholipase A2-activating protein and the associated neurological phenotype, reported herein for the first time, join other complex phospholipid defects that cause leukoencephalopathies in humans, emphasizing the importance of this axis in white matter development and maintenance.

    Sequences, Annotation and Single Nucleotide Polymorphism of the Major Histocompatibility Complex in the Domestic Cat

    Two sequences of major histocompatibility complex (MHC) regions in the domestic cat, 2.976 and 0.362 Mbp, which were separated by an ancient chromosome break (55–80 MYA) followed by a chromosomal inversion, were annotated in detail. Gene annotation of this MHC was completed and identified 183 possible coding regions (147 human homologues, possible functional genes, and 36 pseudo/unidentified genes) using the GENSCAN, BLASTN, BLASTP, and RepeatMasker programs. The first region spans 2.976 Mbp of sequence, which encodes six classical class II antigens (three DRA and three DRB antigens) lacking the functional DP and DQ regions, nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, LMP2/LMP7, and TAP1/TAP2), 52 class III genes, and nineteen class I genes/gene fragments (FLAI-A to FLAI-S). Three class I genes (FLAI-H, I-K, I-E) may encode functional classical class I antigens based on deduced amino acid sequence and promoter structure. The second region spans 0.362 Mbp of sequence encoding no class I genes and 18 cross-species conserved genes, excluding class I, II and their functionally related/associated genes, namely framework genes, including three olfactory receptor genes. One previously identified feline endogenous retrovirus, a baboon retrovirus-derived sequence (ECE1), and two new endogenous retrovirus sequences similar to brown bat endogenous retrovirus (FERVmlu1, FERVmlu2) were found within a 140 Kbp interval in the middle of the class I region. MHC SNPs were examined by comparing this BAC sequence with MHC-homozygous 1.9× WGS sequences, which identified 11,654 SNPs in 2.84 Mbp (0.00411 SNP per bp). This rate is 2.4 times higher than that of the average heterozygous region in the WGS (0.0017 SNP per bp) and slightly higher than the SNP rate observed in human MHC (0.00337 SNP per bp).
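
The SNP-density comparison at the end of the abstract is simple arithmetic and can be reproduced from the quoted figures. The sketch below uses only numbers given in the abstract; the variable names are illustrative. Because the 2.84 Mbp span is itself rounded, the recomputed rate may differ from the quoted 0.00411 in the last digit.

```python
# Figures quoted in the abstract.
snp_count = 11_654
compared_span_bp = 2.84e6      # 2.84 Mbp, rounded in the source
wgs_rate = 0.0017              # average heterozygous region in the cat WGS
human_mhc_rate = 0.00337       # SNP rate reported for human MHC

cat_mhc_rate = snp_count / compared_span_bp
fold_vs_wgs = cat_mhc_rate / wgs_rate
fold_vs_human_mhc = cat_mhc_rate / human_mhc_rate

print(f"Cat MHC SNP rate: {cat_mhc_rate:.4f} per bp")
print(f"Fold over WGS average: {fold_vs_wgs:.1f}")
print(f"Fold over human MHC: {fold_vs_human_mhc:.2f}")
```

The recomputed density is about 0.0041 SNP per bp, roughly 2.4 times the WGS average and about 1.2 times the human MHC rate, in line with the abstract.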

    A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites

    Insertional mutagens such as viruses and transposons are a useful tool for performing forward genetic screens in mice to discover cancer genes. These screens are most effective when performed using hundreds of mice; however, until recently a major limitation to performing screens on this scale has been the cost-effective isolation and sequencing of insertion sites. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA isolation, DNA digestion, linker or splinkerette ligation, primary and secondary PCR amplification, and sequencing. This method, which takes about 1 week to perform, has allowed us to isolate hundreds of thousands of insertion sites from mouse tumours. Unlike other methods, it has been specifically optimised for the isolation of insertion sites generated with the murine leukaemia virus (MuLV), and it can easily be performed in 96-well plate format for the efficient multiplex isolation of insertion sites.

    Comparing De Novo Genome Assembly: The Long and Short of It

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality and their sizes. For this purpose, most of the publicly available major sequence assemblers, for both low-coverage long-read (Sanger) and high-coverage short-read (Illumina) technologies, are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
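
To make the point about size-only metrics concrete, here is a minimal sketch of N50 under its usual definition (the contig length at which contigs of that length or longer account for at least half of the assembled bases). The contig lengths are invented for illustration, and this is not the paper's FRC tool. Nothing in the calculation looks at whether a contig is correctly assembled.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig downward, the running total first reaches half of all bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb), chosen only for illustration.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
print(example_n50)
```

Merging two contigs incorrectly would raise this number while lowering accuracy, which is the kind of anomaly the abstract says size-based metrics fail to capture.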

    Mapping and sequencing of structural variation from eight human genomes

    Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale, particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, a standard for genotyping platforms and a prelude to future individual genome sequencing projects.

    Reduced Neutrophil Count in People of African Descent Is Due To a Regulatory Variant in the Duffy Antigen Receptor for Chemokines Gene

    Persistently low white blood cell count (WBC) and neutrophil count is a well-described phenomenon in persons of African ancestry, whose etiology remains unknown. We recently used admixture mapping to identify an approximately 1-megabase region on chromosome 1, where ancestry status (African or European) almost entirely accounted for the difference in WBC between African Americans and European Americans. To identify the specific genetic change responsible for this association, we analyzed genotype and phenotype data from 6,005 African Americans from the Jackson Heart Study (JHS), the Health, Aging and Body Composition (Health ABC) Study, and the Atherosclerosis Risk in Communities (ARIC) Study. We demonstrate that the causal variant must be at least 91% different in frequency between West Africans and European Americans. An excellent candidate is the Duffy Null polymorphism (SNP rs2814778 at chromosome 1q23.2), which is the only polymorphism in the region known to be so differentiated in frequency and is already known to protect against Plasmodium vivax malaria. We confirm that rs2814778 is predictive of WBC and neutrophil count in African Americans above and beyond the previously described admixture association (P = 3.8×10⁻⁵), establishing a novel phenotype for this genetic variant.

    The International HapMap Project

    Peer Reviewed http://deepblue.lib.umich.edu/bitstream/2027.42/62838/1/nature02168.pd